A multi-display, switched video (“telepresence”) conferencing system is a system of components and endpoint devices that create a live, face-to-face meeting experience over a network, allowing users to interact and collaborate in such a way that it seems as if all remote participants are present in the same room. Existing telepresence solutions, for example, combine life-size, true high-definition video images, CD-quality audio, a specially designed environment, and interactive elements to create the feeling of being “in person” at a virtual table with participants from remote locations. Some commercially available telepresence systems are designed for small group meetings and one-on-one conversations, while others are designed for larger group meetings. Regardless of the size of the conference or meeting, the endpoints typically work in conjunction with a manager software application, which provides call scheduling and setup.
The goal of a telepresence conferencing system is to allow the participants to focus on the meeting, not the technology, and thus communicate naturally and effectively. One way that this is accomplished is by having the audio directionality track the video display, e.g., by locating a loudspeaker adjacent to each video display. The idea, in other words, is to have the audio for a speaking participant come out of the loudspeaker adjacent to where that participant's image is being displayed. A problem arises, however, when a change to the display is triggered by a new speaking participant. In such cases, the audio from the new speaking participant usually precedes the video switching operation by a couple of seconds, because the switch is deliberately delayed to prevent the video images from thrashing in response to short talk spurts. Since the video switching system has yet to start rendering the new participant's video on a display, the audio is often rendered in the wrong place (i.e., at the wrong loudspeaker location). When the video is eventually displayed, the audio may abruptly jump from one loudspeaker to another, causing distracting artifacts that may disorient the participants or disrupt the virtual table experience.
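The timing problem described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not a description of any particular system: the 2-second hold-off value, the fallback loudspeaker used for a participant who is not yet on any display, and the names `Room`, `on_audio`, and `replace_display` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

HOLD_OFF_S = 2.0       # assumed delay before a new talker triggers a video switch
FALLBACK_SPEAKER = 0   # assumed loudspeaker for a talker not yet on any display


@dataclass
class Room:
    # displays[i] is the participant shown on display i; loudspeaker i sits next to it
    displays: List[str]
    candidate: Optional[str] = None   # off-screen talker waiting out the hold-off
    candidate_since: float = 0.0

    def loudspeaker_for(self, talker: str) -> int:
        """Return the loudspeaker adjacent to the talker's image, or the fallback."""
        if talker in self.displays:
            return self.displays.index(talker)
        return FALLBACK_SPEAKER

    def on_audio(self, talker: str, now: float, replace_display: int) -> int:
        """Process one audio sample; return the loudspeaker that renders it."""
        if talker in self.displays:
            self.candidate = None                      # already on screen
        elif talker != self.candidate:
            self.candidate = talker                    # start the hold-off timer
            self.candidate_since = now
        elif now - self.candidate_since >= HOLD_OFF_S:
            self.displays[replace_display] = talker    # the delayed video switch
            self.candidate = None
        return self.loudspeaker_for(talker)


def simulate() -> list:
    """A new talker ("Dave") speaks continuously, sampled every 0.5 s."""
    room = Room(displays=["Alice", "Bob", "Carol"])
    trace = []
    for step in range(7):
        now = step * 0.5
        trace.append((now, room.on_audio("Dave", now, replace_display=2)))
    return trace


if __name__ == "__main__":
    for now, speaker in simulate():
        print(f"t={now:.1f}s  audio on loudspeaker {speaker}")
```

In this sketch the new talker's audio is rendered on loudspeaker 0 for the first four samples (0.0 s through 1.5 s), then jumps to loudspeaker 2 at 2.0 s, when the hold-off expires and the image finally appears on display 2. That jump is the abrupt relocation described above.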